Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP

نویسندگان

  • Cristina Mota
  • Paula Carvalho
  • Anabela Barreiro
چکیده

This paper introduces Port4NooJ v3.0, the latest version of the Portuguese module for NooJ, highlights its main features, and details its three main new components: (i) a lexicon-grammar based dictionary of 5,177 human intransitive adjectives, and a set of local grammars that use the distributional properties of those adjectives for paraphrasing (ii) a polarity dictionary with 9,031 entries for sentiment analysis, and (iii) a set of priority dictionaries and local grammars for named entity recognition. These new components were derived and/or adapted from publicly available resources. The Port4NooJ v3.0 resource is innovative in terms of the specificity of the linguistic knowledge it incorporates. The dictionary is bilingual Portuguese-English, and the semantico-syntactic information assigned to each entry validates the linguistic relation between the terms in both languages. These characteristics, which cannot be found in any other public resource for Portuguese, make it a valuable resource for translation and paraphrasing. The paper presents the current statistics and describes the different complementary and synergic components and integration efforts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Introducing the Reference Corpus of Contemporary Portuguese Online

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resourc...

متن کامل

The TermiNet Project: an Overview

Linguistic resources with domain-specific coverage are crucial for the development of concrete Natural Language Processing (NLP) systems. In this paper we give a global introduction to the ongoing (since 2009) TermiNet project, whose aims are to instantiate a generic NLP methodology for the development of terminological wordnets and to apply the instantiated methodology for building a terminolo...

متن کامل

Portuguese Large-scale Language Resources for NLP Applications

The paper describes Portuguese large-scale linguistic resources, mainly computational lexicons and grammars, developed by LabEL. These resources are formalized and applied to texts by means of finite-state techniques, more and more acknowledged in Natural Language Processing. On the one hand, it illustrates methods on lexical representation for simple words and multi-word expressions; on the ot...

متن کامل

OAL: A NLP Architecture to Improve the Development of Linguistic Resources for NLP

The performance of most NLP applications relies upon the quality of linguistic resources. The creation, maintenance and enrichment of those resources are a labour-intensive task. In this paper we present the NLP architecture OAL, designed to assist computational linguists in the whole process of the development of resources in an industrial context: from corpora compilation to quality assurance...

متن کامل

: A NLP Architecture to Improve the Development of Linguistic Resources for NLP

The performance of most NLP applications relies upon the quality of linguistic resources. The creation, maintenance and enrichment of those resources are a labour-intensive task. In this paper we present the NLP architecture OAL, designed to assist computational linguists in the whole process of the development of resources in an industrial context: from corpora compilation to quality assurance...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016